Smart Paper Notes

Classify and Generate: Using Classification Latent Space Representations for Image Generations

Saisubramaniam Gopalakrishnan, Pranshu Ranjan Singh, Yasin Yazici, Chuan-Sheng Foo, Vijay Chandrasekhar, ArulMurugan Ambikapathi

Categories: Machine Learning | Machine Learning (Statistics)

2020-04-16

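Using classification latent space information for downstream reconstruction and generation is an intriguing and relatively unexplored area. In general, discriminative representations are rich in class-specific features but too sparse for reconstruction, whereas autoencoder representations are dense but have limited, hard-to-distinguish class-specific features, which makes them less suitable for classification. In this work, we propose a discriminative modeling framework that employs manipulated supervised latent representations to reconstruct and generate new samples belonging to a given class. Unlike generative modeling approaches such as GANs and VAEs, which aim to model the distribution of the data manifold, Representation based Generations (ReGene) directly represent the given data manifold in the classification space. Under certain constraints, such supervised representations allow reconstruction and controlled generation with an appropriate decoder, without enforcing any prior distribution. Theoretically, given a class, we show that these representations retain the same class label when smartly manipulated using convex combinations. Furthermore, they lead to novel, visually realistic generated images. Extensive experiments on datasets of varying resolutions demonstrate that ReGene achieves higher classification accuracy than existing conditional generative models while remaining competitive in terms of FID.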

Translated by Google Translate
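
The claim that convex combinations keep the class label can be illustrated with a small numerical check. The sketch below assumes, purely for illustration and not as the paper's stated proof, that the label is the argmax of a linear layer over the latent vector; each class region is then an intersection of half-spaces and therefore convex. The weights `W`, `b` and the latent vectors are made-up values.

```python
import random

# Hypothetical linear classifier head over a 2-D latent: logits = W z + b.
W = [[1.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.5, 0.5]

def predict(z):
    """Return the argmax class of the linear head for latent vector z."""
    logits = [sum(w * x for w, x in zip(row, z)) + bias for row, bias in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)

def convex_combination(latents, weights):
    """Weighted sum of latent vectors; weights are non-negative and sum to 1."""
    dim = len(latents[0])
    return [sum(w * z[d] for w, z in zip(weights, latents)) for d in range(dim)]

def random_simplex_weights(n, rng):
    """Draw n positive weights that sum to 1."""
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
# Made-up latents that the head above assigns to class 0.
class0_latents = [[2.0, 0.3], [3.5, -1.0], [1.5, 0.8]]
mixed = [convex_combination(class0_latents, random_simplex_weights(3, rng))
         for _ in range(1000)]
labels = {predict(z) for z in mixed}
print(labels)  # every mixture stays in class 0
```

With a nonlinear classifier the class regions need not be convex, which is presumably why the abstract speaks of "certain constraints".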

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth

Categories: Natural Language Processing | Artificial Intelligence

2022-12-18

Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to perform a task via in-context learning is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed forward networks can be removed with minimal decline in task performance. We find substantial overlap in the set of attention heads (un)important for in-context learning across tasks and number of in-context examples. We also address our hypothesis through a task-agnostic lens, finding that a small set of attention heads in OPT-66B score highly on their ability to perform primitive induction operations associated with in-context learning, namely, prefix matching and copying. These induction heads overlap with task-specific important heads, suggesting that induction heads are among the heads capable of more sophisticated behaviors associated with in-context learning. Overall, our study provides several insights that indicate large language models may be under-trained to perform in-context learning and opens up questions on how to pre-train language models to more effectively perform in-context learning.
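
The two primitive operations named above, prefix matching and copying, can be illustrated on a plain token list. This is a toy sketch of the behaviour only: it says nothing about how attention heads implement it or how the paper scores heads, and the function name `induction_predict` is hypothetical.

```python
def induction_predict(tokens):
    """Toy induction rule: find the most recent earlier occurrence of the
    last token (prefix matching) and return the token that followed it
    (copying). Returns None if the last token has not appeared before."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

print(induction_predict(["A", "B", "C", "A"]))  # B
print(induction_predict(["A", "B", "C", "D"]))  # None
```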

Bi-Noising Diffusion: Towards Conditional Diffusion Models with Generative Restoration Priors

Kangfu Mei, Nithin Gopalakrishnan Nair, Vishal M. Patel

Categories: Computer Vision

2022-12-14

Conditional diffusion probabilistic models can model the distribution of natural images and can generate diverse and realistic samples based on given conditions. However, oftentimes their results can be unrealistic with observable color shifts and textures. We believe that this issue results from the divergence between the probabilistic distribution learned by the model and the distribution of natural images. The delicate conditions gradually enlarge the divergence during each sampling timestep. To address this issue, we introduce a new method that brings the predicted samples to the training data manifold using a pretrained unconditional diffusion model. The unconditional model acts as a regularizer and reduces the divergence introduced by the conditional model at each sampling step. We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks. The improvements obtained by our method suggest that the priors can be incorporated as a general plugin for improving conditional diffusion models.

RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu

Categories: Robotics | Artificial Intelligence | Natural Language Processing | Computer Vision | Machine Learning

2022-12-13

By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io

Unite and Conquer: Cross Dataset Multimodal Synthesis using Diffusion Models

Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

Categories: Computer Vision

2022-12-01

Generating photos that satisfy multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide generation toward the desired outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
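
The abstract does not give the closed form itself. One standard fact that makes closed forms of this kind possible is that the product of Gaussian densities is again proportional to a Gaussian, with a precision-weighted mean. The sketch below shows that fact for two one-dimensional Gaussians; it is an illustration under that assumption, not the paper's sampling rule, and the function name `product_of_gaussians` is hypothetical.

```python
def product_of_gaussians(mean_a, var_a, mean_b, var_b):
    """Mean and variance of the Gaussian proportional to
    N(mean_a, var_a) * N(mean_b, var_b) in one dimension."""
    precision = 1.0 / var_a + 1.0 / var_b          # precisions add
    var = 1.0 / precision
    mean = var * (mean_a / var_a + mean_b / var_b)  # precision-weighted mean
    return mean, var

# Two equally confident components: the result sits halfway between them.
mean_eq, var_eq = product_of_gaussians(0.0, 1.0, 2.0, 1.0)

# A narrower (more confident) second component pulls the mean toward itself.
mean_sk, var_sk = product_of_gaussians(0.0, 1.0, 3.0, 0.5)
print(mean_eq, var_eq, mean_sk, var_sk)
```

In this picture, the component with the smaller variance dominates the combined mean.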

pyRDDLGym: From RDDL to Gym Environments

Ayal Taitler, Michael Gimelfarb, Sriram Gopalakrishnan, Martin Mladenov, Xiaotian Liu, Scott Sanner

Categories: Artificial Intelligence

2022-11-11

We present pyRDDLGym, a Python framework for auto-generation of OpenAI Gym environments from RDDL declarative descriptions. The discrete time step evolution of variables in RDDL is described by conditional probability functions, which fits naturally into the Gym step scheme. Furthermore, since RDDL is a lifted description, modifying and scaling up environments to support multiple entities and different configurations becomes trivial rather than a tedious, error-prone process. We hope that pyRDDLGym will serve as a breath of fresh air for the reinforcement learning community by enabling easy and rapid development of benchmarks, thanks to the unique expressive power of RDDL. By providing explicit access to the model in the RDDL description, pyRDDLGym can also facilitate research on hybrid approaches for learning from interaction while leveraging model knowledge. We present the design and built-in examples of pyRDDLGym, and the additions made to the RDDL language that were incorporated into the framework.
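
To make the link between conditional probability functions and the Gym step scheme concrete, here is a minimal sketch of a hand-written environment whose `step` samples the next state from a conditional distribution. It uses neither pyRDDLGym nor the Gym package; the class `ToyReservoirEnv`, its dynamics, and its reward are all invented for illustration, and `step` follows the classic four-value Gym convention (observation, reward, done, info).

```python
import random

class ToyReservoirEnv:
    """Hypothetical one-variable environment with a Gym-style interface."""

    def __init__(self, capacity=10, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.level = 0

    def reset(self):
        """Start each episode with a half-full reservoir."""
        self.level = self.capacity // 2
        return self.level

    def _sample_next_level(self, level, release):
        # Conditional distribution P(level' | level, release):
        # rainfall adds 0, 1, or 2 units with equal probability.
        rain = self.rng.choice([0, 1, 2])
        return max(0, min(self.capacity, level - release + rain))

    def step(self, action):
        """Release up to `action` units, then sample the next water level."""
        release = min(action, self.level)
        self.level = self._sample_next_level(self.level, release)
        reward = -abs(self.level - self.capacity // 2)  # stay near half full
        done = False  # this toy task has no terminal state
        return self.level, reward, done, {}

env = ToyReservoirEnv()
obs = env.reset()
for _ in range(5):
    obs, reward, done, info = env.step(1)
```

Writing such a class by hand for every variant is the tedious step that the abstract says a lifted RDDL description avoids.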

Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation

Siddharth Nayak, Kenneth Choi, Wenqi Ding, Sydney Dolan, Karthik Gopalakrishnan, Hamsa Balakrishnan

Categories: Artificial Intelligence | Robotics

2022-11-03

We consider the problem of multi-agent navigation and collision avoidance when observations are limited to the local neighborhood of each agent. We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. Specifically, InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm. We show that (1) in training, InforMARL has better sample efficiency and performance than baseline approaches, despite using less information, and (2) in testing, it scales well to environments with arbitrary numbers of agents and obstacles.
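
The idea of aggregating information from an agent's local neighborhood can be sketched without any learning. The example below uses plain mean pooling over entities within a sensing radius as a stand-in for the graph neural network; the function names, the radius, and the feature vectors are hypothetical, and the actual InforMARL architecture is not specified in the abstract.

```python
import math

def neighbors_within(positions, i, radius):
    """Indices of all other entities within `radius` of entity i."""
    xi, yi = positions[i]
    return [j for j, (x, y) in enumerate(positions)
            if j != i and math.hypot(x - xi, y - yi) <= radius]

def aggregate_local(positions, features, i, radius):
    """Mean of the feature vectors of entities near entity i (zeros if none)."""
    nbrs = neighbors_within(positions, i, radius)
    dim = len(features[i])
    if not nbrs:
        return [0.0] * dim
    return [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]

positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
features = [[1.0, 0.0], [0.0, 2.0], [9.0, 9.0]]
near_agent0 = aggregate_local(positions, features, 0, radius=1.5)
near_agent2 = aggregate_local(positions, features, 2, radius=1.5)
print(near_agent0, near_agent2)
```

The aggregated vector has the same length however many entities are nearby, which is one property that lets an aggregator of this kind be applied to environments with varying numbers of agents and obstacles.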

Open-vocabulary Queryable Scene Representations for Real World Planning

Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler

Categories: Robotics | Artificial Intelligence | Computer Vision

2022-09-20

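Large language models (LLMs) have unlocked new capabilities for task planning from human instructions. However, prior attempts to apply LLMs to real-world robotic tasks have been limited by the lack of grounding in the surrounding scene. In this paper, we develop NLMap, an open-vocabulary and queryable scene representation, to address this problem. NLMap serves as a framework to gather and integrate contextual information into LLM planners, allowing them to see and query the objects available in the scene before generating a context-conditioned plan. NLMap first builds a natural-language-queryable scene representation using visual language models (VLMs). An LLM-based object proposal module then parses the instruction and proposes the objects involved, which are used to query the scene representation for object availability and location. An LLM planner then plans with this information about the scene. NLMap allows robots to operate without a fixed list of objects or executable options, enabling real robot operation that previous methods could not achieve. Project website: https://nlmap-saycan.github.io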

NBD-GAP: Non-Blind Image Deblurring Without Clean Target Images

Nithin Gopalakrishnan Nair, Rajeev Yasarla, Vishal M. Patel

Categories: Computer Vision

2022-09-20

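In recent years, restoration methods based on deep neural networks have achieved state-of-the-art results in various image deblurring tasks. However, one major drawback of deep-learning-based deblurring networks is that training requires a large number of blurry-clean image pairs to achieve good performance. Moreover, deep networks often fail to perform well when the blurry images and blur kernels seen at test time differ substantially from those used during training. This happens mainly because the network parameters overfit the training data. In this work, we present a method that addresses these issues. We view the non-blind image deblurring problem as a denoising problem. To do so, we perform Wiener filtering on a pair of blurry images with the corresponding blur kernels. This results in a pair of images with colored noise. Hence, the deblurring problem is translated into a denoising problem. We then solve the denoising problem without using explicit clean target images. Extensive experiments show that our method achieves results on par with state-of-the-art non-blind deblurring works.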
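
The Wiener filtering step can be sketched in one dimension with a known blur kernel. This is the textbook frequency-domain Wiener deconvolution, not the paper's pipeline: the signal, the kernel, the circular-blur model, and the noise-to-signal constant `nsr` are assumptions chosen for illustration, and a naive DFT keeps the example free of dependencies.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_blur(signal, kernel):
    """Circular convolution of the signal with a short blur kernel."""
    n = len(signal)
    return [sum(kernel[j] * signal[(t - j) % n] for j in range(len(kernel)))
            for t in range(n)]

def wiener_deconvolve(blurred, kernel, nsr):
    """Apply the Wiener filter conj(H) / (|H|^2 + nsr) in the frequency domain."""
    n = len(blurred)
    padded = list(kernel) + [0.0] * (n - len(kernel))
    H = dft(padded)
    Y = dft(blurred)
    X_hat = [h.conjugate() * y / (abs(h) ** 2 + nsr) for h, y in zip(H, Y)]
    return [v.real for v in dft(X_hat, inverse=True)]

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.3, 0.2]  # known blur kernel (the non-blind setting)
blurred = circular_blur(signal, kernel)
restored = wiener_deconvolve(blurred, kernel, nsr=1e-9)
max_error = max(abs(r - s) for r, s in zip(restored, signal))
```

When the blurred input also contains noise, the same filter reshapes that noise differently at each frequency, which is consistent with the abstract's remark that the filtered images carry colored noise.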

Computing Anti-Derivatives using Deep Neural Networks

D. Chakraborty, S. Gopalakrishnan

Categories: Machine Learning

2022-09-19

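This paper presents a novel algorithm for obtaining the closed-form anti-derivative of a function using a deep neural network architecture. In the past, mathematicians have developed several numerical techniques to approximate the values of definite integrals, but primitives, or indefinite integrals, are often non-elementary. Anti-derivatives are required when the integrand contains several parameters and the resulting integral is a function of those parameters. There is no theoretical method that can do this for an arbitrary function. Existing workarounds are mainly based on either curve fitting or an infinite series approximation of the integrand, which is then integrated analytically. Curve-fitting approximations are inaccurate for highly non-linear functions and require a different approach for every problem. The infinite series approach, on the other hand, does not give a closed-form solution, and its truncated forms are often inaccurate. We claim that, using a single method for all integrals, our algorithm can approximate anti-derivatives to any required accuracy. We have used this algorithm to obtain the anti-derivatives of several functions, including non-elementary and oscillatory integrals. This paper also shows applications of our method to obtaining closed-form expressions for elliptic integrals, Fermi-Dirac integrals, and cumulative distribution functions, and to reducing the computation time of the Galerkin method for differential equations.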
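
The abstract does not state the training objective. One natural formulation, given here only as an assumption, is to train a network whose derivative matches the integrand, so that the network itself is a closed-form anti-derivative. The one-hidden-layer model and the symbols $v_i$, $w_i$, $b_i$, $c$, $m$, $N$ below are illustrative and not taken from the paper.

```latex
% Hypothetical one-hidden-layer model with parameters \theta = \{v_i, w_i, b_i, c\}
F_\theta(x) = c + \sum_{i=1}^{m} v_i \tanh(w_i x + b_i), \qquad
F_\theta'(x) = \sum_{i=1}^{m} v_i w_i \operatorname{sech}^2(w_i x + b_i)

% Fit the derivative to the integrand f on sample points x_1, \dots, x_N
\mathcal{L}(\theta) = \frac{1}{N} \sum_{j=1}^{N} \bigl( F_\theta'(x_j) - f(x_j) \bigr)^2

% After training, definite integrals follow from the closed-form F_\theta
\int_a^b f(x)\,dx \approx F_\theta(b) - F_\theta(a)
```

Under this formulation the constant $c$ does not affect the loss, which mirrors the arbitrary constant of integration.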
